
What happens when curiosity leads you to teach a language model to write haikus? You discover that fine-tuning is less about fancy models and more about debugging datasets, chasing strange failures, and running hundreds of experiments. In this talk, I'll share how I fine-tuned Gemma-3 with MLX on Apple Silicon, going from outputs that looked like ancient Sumerian inscriptions to a model that crossed 55% accuracy. We'll explore dataset engineering, LoRA tuning, evaluation loops, tokenization challenges, and the practical lessons learned along the way.